Enhancing co-seismic landslide susceptibility, building exposure, and risk analysis through machine learning

Landslides are devastating natural disasters that generally occur on fragile slopes. Landslides are influenced by many factors, such as geology, topography, natural drainage, land cover, rainfall and earthquakes, although the underlying mechanism is too complex and very difficult to explain in detail. In this study, the susceptibility mapping of co-seismic landslides is carried out using a machine learning approach, considering six districts covering an area of 12,887 km2 in Nepal. Landslide inventory map is prepared by taking 23,164 post seismic landslide data points that occurred after the 7.8 MW 2015 Gorkha earthquake. Twelve causative factors, including distance from the rupture plane, peak ground acceleration and distance from the fault, are considered input parameters. The overall accuracy of the model is 87.2%, the area under the ROC curve is 0.94, the Kappa coefficient is 0.744 and the RMSE value is 0.358, which indicates that the performance of the model is excellent with the causative factors considered. The susceptibility thus developed shows that Sindhupalchowk district has the largest percentage of area under high and very high susceptibility classes, and the most susceptible local unit in Sindhupalchowk is the Barhabise municipality, with 19.98% and 20.34% of its area under high and very high susceptibility classes, respectively. For the analysis of building exposure to co-seismic landslide susceptibility, a building footprint map is developed and overlaid on the co-seismic landslide susceptibility map. The results show that the Sindhupalchowk and Dhading districts have the largest and smallest number of houses exposed to co-seismic landslide susceptibility. Additionally, when conducting a risk analysis based on susceptibility mapping, as well as considering socio-economic and structural vulnerability in Barhabise municipality, revealed that only 106 (1.1%) of the total 9591 households, were found to be at high risk. As this is the first study of co-seismic landslide risk study carried out in Nepal and covers a regional to the municipal level, this can be a reference for future studies in Nepal and other parts of the world and can be helpful in planning development activities for government bodies.

www.nature.com/scientificreports/generally not recorded by the government body.On the other hand, more than 25,000 landslides were triggered by the 2015 Gorkha earthquake (7.8 Mw), and its aftershock in central Nepal 14 infers the importance of coseismic landslide studies in reducing disaster risk.The development of a proper susceptibility map considering effective causative factors and their use in the analysis and design of mitigation help to minimize the loss from landslides 8 .Many studies have been carried out on landslide susceptibility mapping along the Himalayas, but they are limited to a small spatial area and do not consider the direct influence of an earthquake 4,15 .The complex mechanism of landslide occurrence due to earthquakes is not well understood, and there are very limited studies that have considered landslide susceptibility analysis considering earthquakes.
Different approaches have been implemented for landslide susceptibility mapping, such as different kinds of statistical methods.Models like weight of evidence and frequency ratio are bivariate statistical methods and logistic regression is a multivariate statistical method.Since the past decade, artificial intelligence (AI) methods for instance, machine learning (ML) methods for example: artificial neural network (ANN), support vector machine (SVM), random forest (RM), extremely randomized trees (ERT), extreme gradient boosting (XGB), and classification and regression trees (CART) have become popular methods for landslide susceptibility mapping 16,17 .In the conventional method of landslide susceptibility modeling, including the heuristic approach and the knowledge-based approach, a high level of expertise is needed along with a strong knowledge of geomechanical and geomorphological factors influencing the occurrences of landslides.The methods are cumbersome to apply.Similarly, physical-based modeling also needs more time to compute the relation between landslides and different parameters of slope-forming materials.These limitations can be mitigated by applying ML models for susceptibility mapping 18 .Due to the complex relationships between landslide conditioning factors and landslide susceptibility, ML methods are superior to conventional approaches in these domains 17,19 .Since ML algorithms are robust, can handle missing values, and can optimize the fitting of a model, i.e., avoid overfitting, ensemble ML techniques such as RF, XGB, ERT, and CART are widely used in the field of data classification and regression 20,21 .
Some studies are carried out in different part of Nepal using different statistical methods 22,23 like index entropy, certainty factor method, frequency ratio etc.A few machine learning methods are applied for landslide susceptibility mapping in watershed level study in some districts [24][25][26][27][28] .In addition to being suitable for large areas, they do not require that the environmental factors be distributed normally.Pyakurel et al. 21performed landslide susceptibility mapping in regional level employing different ML models, and concluded that ERT model has best prediction capability among the models employed.Moreover, ERT remove the discretization threshold values through optimization for node splitting which makes it robust and more accurate 29,30 .Similarly, other studies have reported that performance of ERT is better than other ML models form their studies for flood and landslide susceptibility mapping 29,31 Therefore, ERT model is applied for the susceptibility mapping of co-seismic landslides distributed in a large area.
A co-seismic landslide susceptibility map, when considered from the perspective of disaster risk reduction and management, can play a significant role in associated risk reduction and management by providing information on the likelihood of landslide occurrence.When landslide-prone locations are well recognized, it is simple to develop risk reduction and management plans.By utilizing the different methods to determine the nature and extent of hazard, exposure and vulnerability any associated risk can be assessed 32 .Different studies were carried out using quantitative, qualitative and semi-quantitative risk assessment methods [33][34][35][36][37] .Semi-quantitative method was used by Althuwaynee et al. 37 where vulnerability was assumed to be constant and the product of hazard probability and exposure was considered landslide risk.Bell et al. 36 used quantitative risk assessment using all three components, i.e., hazard, vulnerability and exposure, to prepare risk map.By combining the probability of hazard and vulnerability, landslide risk map was prepared 38 .
Buildings in the higher risk class are exposed to higher risk.While the buildings in lower risk class have minimal risk.This crucial information on the exposure of buildings to disasters is important for developing resilient communities and useful in disaster risk reduction of the community.Therefore, this study focuses on preparing a susceptibility map on regional and municipal scales with building exposure analysis and risk map for a municipal level to highlight the importance of co-seismic landslide risk study.

Study area
With a total area of 12,886.95km 2 , the research area is located in Nepal's central region between 27.47° N to 28.75° N and 84.42° E to 86.55° E. It belongs to both lesser Himalaya and higher Himalaya geological zones.The Himal Group formation is the main geological formation in the higher altitude area, and it contains highly graded metamorphic rocks such as gneisses, kyanite quartzites, and thin layers of marble.The Ranimatta Formation, which consists of grayish greenish gray phyllites gravelstones with conglomarates and white massice quartzites in the upper part, is the dominant geological unit in the lesser Himalaya.The geomorphology of the study area has been influenced by tectonic activity, weathering, erosion, and slope failures, resulting in steep peaks and deep gorges.The slope of the study area ranges from 0° to 87.98°, with significant elevation changes between 285 and 7954 m.Intense rainfall, dense populations, and earthquakes can all increase the risk of fatal landslides in the area considered 14 .The study area is shown in Fig. 1.

Methodology
The present study analyzed an approach for the mitigation of co-seismic landslide hazards, considering the 23,164 landslides that occurred after the 2015 Gorkha earthquake (7.8 Mw) in Nepal.The distribution of these landslides in relation to the rupture plane of the earthquake is analyzed and is considered one of the major factors for landslide occurrence.Peak ground acceleration is another important factor considered for the analysis of these co-seismic landslides.Nine other factors, including slope, aspect, total curvature, topographic ruggedness index (TRI), distance to rivers, distance to roads, land cover and land use, and geology, are also considered contributing factors for landslide occurrence.Quantum Geographic Information System (QGIS) 39 is used for all geospatial data preparation and calculation.Moreover, one of the machine learning methods, i.e., the extreme randomness tree (ERT), is used for landslide susceptibility mapping, and the results thus obtained are validated by plotting the success rating curves.A building footprint map is created using data from the open street map (OSM), which is then overlaid on the susceptibility map, and zonal statistics for the building footprint are calculated using QGIS.After analyzing the zonal statistics of the buildings, the exposure of the buildings in each susceptibility class is evaluated.Finally, socio-economic and structural vulnerability assessment is carried out using "Technique for Order Preference by Similarity to an Ideal Solution" (TOPSIS) method.To evaluate the landslide risk, the product of probability of landslide and the vulnerability is used.The overall research framework is given in Fig. 2.

Data acquisition and processing
The landslide inventory map containing 23,164 post earthquake landslide polygons was prepared the study of Roback et al. 14 .Each landslide was represented in the analysis by a point at its centroid.Similarly, to consider the non-landslide samples, random points were generated within the study boundary and the points inside landslide polygons were carefully removed.To overcome the imbalance in training and testing around 23,000 points were generated for non-landslide data.The training-to-validation data ratio in most research ranges from 60:40 to 70:30 40,41 .In this study, the ratio is kept at 70:30, which means that out of 23,164 landslides, 16,214 are considered the training set and the rest the validation set. Figure 3 depicts the landslide inventory map with training and validation sets.The study area's digital elevation model (DEM) was obtained from the US Geological Survey (USGS).The downloaded DEM had a grid size of 30 m by 30 m.The slope, aspect, total curvature, topographic ruggedness index (TRI) and distance to rivers were derived using the DEM.Similarly, the layers of distance to faults and geology were created using data from the Department of Mines and Geology (DMG).From USGS, the layer peak ground acceleration (PGA) was obtained.Based on the research of Elliott et al. 40,41 , the layer distance to the rupture plane was calculated.Furthermore, the land cover and land use layer was produced using Sentinel-2 satellite remote sensing data, while the distance to the road layer was produced using International Centre for Integrated Mountain Development (ICIMOD) data.All data collected from various sources are processed using GIS.

Causative factors
The occurrence of landslides is influenced by a wide range of causative factors.Seismic parameters such as ground acceleration, distance from the rupture plane, and other topographic, geological, and hydrological factors all contribute to the occurrence of landslides.Twelve causative factors are taken as major factors contributing to the occurrence of landslides as considered in the previous study 21 .Information about these factors is given in Table 1.

Topographic factors
Under topographic factors, elevation, aspect, slope, total curvature, topographic ruggedness index and distance to rivers are considered.The factor map of topographic factors is shown in Fig. 4. www.nature.com/scientificreports/Elevation Elevation is one of the major factors controlling the rainfall distribution, vegetation cover, ground water and weathering of rocks 43 .These factors directly or indirectly influence the occurrence of landslides.The elevation also plays a role in the occurrence of post-earthquake landslides 14 .Therefore, elevation is considered one of the major causative factors in this study.

Aspect
The aspect is one of the important factors for the occurrence of landslides, as it can influence moisture, precipitation, and solar radiation 2,44,45 which play an important role in groundwater and rock strength, which in turn can play an important role in the occurrence of landslides.The slope aspect and the direction of seismic acceleration may have some association.Therefore, aspect is considered one of the main contributing factors to landslide occurrence.

Slope
Instability arises on slopes due to the self-weight of materials triggered by rainfall and strong ground motion 8,46 ; therefore, susceptibility is directly connected with the surrounding slopes; hence, the slope is one of the critical factors for assessing susceptibility 4,17 .The slope angle in the study area ranges from 0 to 87.98° in the study area considered.

Total curvature
Another factor to consider is curvature, which represents the maximum slope direction.According to Pradhan et al. 47 , concavity is a negative curvature, flatness is a zero curvature, and convexity is a positive curvature.Flow and slide are more likely in flatness than in convex curvature.The total curvature in the study area ranges from − 17.56 to 54.23.

Topographic ruggedness index
The topographic ruggedness index (TRI) measures the elevation difference between adjacent DEM cells.TRI is a morphometric measure that describes the variety of a land surface; it classifies the terrain as smooth or rugged 48 .

Distance to rivers
The distance to the river is the stream proximity from the landslides to the nearest natural drainage.Natural drainages can adversely affect the stability of slopes by toe incisions and by saturating slope-forming materials due to an increase in water level 49,50 .In general, the closer the distance from the river is, the higher the density of the landslide 4,25 .

Seismic factors
Under seismic factors, distance to faults, peak ground acceleration and distance to rupture plane are considered.The factor map of seismic factors is shown in Fig. 5.   www.nature.com/scientificreports/also play a role in the movement of ground water, playing an important role in landslide occurrences 7 .During an earthquake, a fault may act as a barrier for the propagation of seismic waves.

Peak ground acceleration (PGA)
Earthquake characteristics, such as magnitude and depth, play an essential role in co-seismic landslide distribution 51,52 .With the same magnitude of the earthquake, the effect can differ based on the peak ground acceleration and amplification factors 53 .Generally, higher destruction and more landslides are likely to occur with the increase in magnitude of an earthquake and the peak ground acceleration 54 .Hence, peak ground acceleration (PGA) is considered one of the thematic layers for susceptibility analysis in this study.

Distance to the rupture plane
Co-seismic landslides and distance from the rupture plane have a strong positive correlation; in most cases, as distance from the rupture plane increases, the frequency of landslides decreases 55 .A denser distribution of coseismic landslides closer to the rupture plane and a sparse distribution away from the rupture plane were also reported by 56 .Using remote sensing techniques, for instance, interferometric synthetic aperture radar (InSAR) data and synthetic aperture radar (SAR) interferometric analysis, some studies have estimated the rupture plane for the Gorkha earthquake 57,58 .In this study, the rupture plane is determined based on the study by Elliott et al. 42 .

Anthropogenic factors
Under anthropogenic factors, distance to roads, land use and land cover are considered.The factor map of anthropogenic factors is given in Fig. 6.

Distance to roads
The stability of slopes is directly affected by anthropogenic factors such as road excavations 59 .Due to unplanned and nonengineered road construction, the mountainous regions of Nepal have experienced many landslides 7 .A significant number of landslide occurrences around roads after the 2015 Gorkha earthquake were reported by Mcadoo et al. 60 .Therefore, the factor distance to road is considered an important factor for landslide occurrence.

Land use and land cover (LULC)
Landslide occurrence is greatly influenced by LULC; for example, infrastructure development activities such as road excavation and urbanization have an impact on landslide occurrence 45,60 .The LULC factor is classified into seven classes: water bodies, forest, shrubs, cultivated land, barren land, snow/ice cover and built-up area.The spatial resolution of LULC used in this study is 10 m × 10 m 61,62 .

Geology
Geological formation and setting affect the amplification and propagation of the ground motion of seismic waves because this geology plays a significant role in the co-seismic landslide distribution.The Himalayas in Nepal are fragile and have very intricate geology with steep slopes that are prone to landslides during strong ground motion.The details on the geology of the study area are given in Fig. 7.

Building the footprint map
To investigate the buildings' exposure to co-seismic landslides, it is necessary to prepare a building distribution map over the study area and overlay it on the co-seismic landslide susceptibility map.The building footprint data over the study area are taken from open street map (OSM).A total of 477,798 buildings are observed in the  www.nature.com/scientificreports/study area based on OSM data.The buildings are primarily concentrated at lower elevations and along the river basins.To examine the exposure of buildings, the building footprint map (Fig. 8) thus developed is overlayed on the co-seismic landslide susceptibility map.The districts and local units with the highest building exposure to high co-seismic landslide susceptible areas are further analyzed.

Modeling of landslide susceptibility
Modeling of landslide susceptibility is carried out using a machine learning (ML) approach.It is a kind of artificial intelligence (AI) that is used for data analysis, classification, and regression problems.There are two types of ML generally used in data classification problems, i.e., supervised and unsupervised models.By analyzing past data, supervised ML models learn the patterns and relations between the target and feature values and then predict the target pattern based on the input features.In this study, an extremely randomized tree (ERT) model is used for the modeling of co-seismic landslide susceptibility.The extremely randomized tree (ERT) model is the extended version of the random forest (RF) model, which is also known as the extra tree model.It is a well-structured supervised machine learning model that is used for regression and classification of data sets.The ERT model generates more reliable results than the RF model by using all sample data sets during the training process, and on the basis of decision tree learning, ERT is a more advanced model that contains numerous decision trees 63 .
The ERT model has three major advantages.The ERT algorithm was proposed by Geurts et al. 30 .It is similar to random forest algorithm, but it has two major differences.The first considers whole data into the construction of one decision tree with random features sets.The second is that it randomly splits the nodes with features forming a large number of trees.ERT's larger tree size and numerous splits aim to reduce bias, enhancing its generalization performance (Fig. 9).The process of ERT splitting comprises three parameters: the attribute selection strength parameter (K), the minimum number of samples needed to split a node (nmin), which defines the average output amount, and the number of trees (M), which dictates the extent of variance reduction 64 .ERT stands out for its robustness against overfitting, strong resistance to noise, and adeptness in managing high-dimensional datasets without pre-selecting features 64 .
In this paper, the ERT model is applied by using the scikitlearn.ensemblelibrary in Python.The extremely randomized tree classifier model in the scikitlearn.ensemblelibrary is used to quantitatively measure the contribution of each causative factor.

Vulnerability assessment
Vulnerability assessment is a crucial part in understanding the risk and mitigate it.To understand the risk associated with hazard event, event occurrence probability, exposed element and vulnerability need to be quantified 65 .
Risk associated with natural hazard can be defined as the function of hazard, exposure, and vulnerability 66 as shown in Eq. ( 1) where H = hazard, E = exposure, V = vulnerability.In this study, vulnerability and risk assessment was carried out at local level i.e., ward level.The ward level data was collected from national population and housing census 2021 67 .For vulnerability calculation, the structural properties of buildings like outer wall materials, foundation types are considered as a physical parameter.Similarly, for the socio-economic variables, economically active age group from 15 to 59 and economically inactive age group 0-14 and greater than 65 are considered.Female to male ratio and female population are also considered under the socio-economic parameters.Literacy affects the people's reach on disaster information and risk understanding, therefore the literacy parameters are considered as well.
For the total vulnerability score calculation TOPSIS is used.TOPSIS is an improved method for vulnerability score calculation based on the entropy weight; the positive ideal solution and negative ideal solution.Based on the distance from either positive and negative ideal solution, the degree of dispersion is measured.
For TOPSIS analysis, after normalization, entropy weight is calculated.Entropy weight method (EWM) is one of the popular methods in the decision-making process 68 since it relies on the information gained from each index value and uses inherent law to calculate weight for each attribute and weightage according to variation of index 69 .First attributes were normalized using vector normalization as of their cardinality 70,71 .
where a ij represent the original data and i represent the municipality index and j represent the index of attributes.Proportion of each i index for j each attribute was calculated using Eq. ( 4) 72 .
here p ij is known as proportion index whose value in case of zero is replaced by the 0.0001 value to maintain validity for natural log calculation.Thus, obtained p ij is then used to calculate entropy ( ϕ i ) of index i using Eq. ( 5) 72 .
(1) www.nature.com/scientificreports/where = 1 ln(m) , here m is the number of samples in the data and m = 9.Weight of entropy for each attribute is defined by Eq. ( 6) 72 Considering L + and L − are the positive and negative ideal solution, which are the maximum and minimum values in matrix L ( t ij ) as given in Eqs. ( 8) and ( 9) 71 .The positive cardinality is considered for positive solution and negative cardinality is considered for negative ideal solution.
here j + and j − are the attributes index with positive and negative cardinality.
The distance from the ideal solution are determined as d iw for worst and d ib for best scenario as given in Eqs. ( 10) and ( 11) 71 .
For the ranking the distance, weight is calculated as I iw as worst-case weight also called relative closeness using Eq. ( 12) 71,72 .
Using this index ( I iw ) for each alternative, different classes or ranks are prepared, and the score is given based on the rank.In this study, worst case is used to calculate weight, which means that highest index represent most vulnerable and lowest index represent the low vulnerability.

Results and discussions
As shown in Fig. 10, a co-seismic landslide susceptibility map is developed using the twelve input parameters and the ERT ML model.The susceptibility map thus developed is divided into five classes depending on the probability of occurrence of earthquake-induced landslides.The area with a probability of co-seismic landslide occurrence between 0 and 0.2 is classified as a very low susceptibility area.It covers 60.51% of the total study area.This area is considered safe for human activities and infrastructure development.
The area with a probability of co-seismic landslide occurrence between 0.2 and 0.4 is classified as a low susceptibility area.It covers 17.13% of the total study area.The area with a probability of co-seismic landslide occurrence between 0.4 and 0.6 is classified as a moderately susceptible area.It covers 10.72% of the total study area.Great care should be taken when planning development activities in moderately susceptible areas.Areas with co-seismic landslide occurrence probabilities of 0.6-0.8 are classified as high susceptibility areas, and the remaining areas with co-seismic landslide probabilities of 0.8 to 1 are classified as very high susceptibility areas.A total of 7.52% and 4.12% of the total study area are classified as high and very high susceptibility classes, respectively.It is recommended to avoid these areas for the development of human settlement and infrastructure.The percentage of landslides in each susceptibility class is shown in Fig. 11.
It is observed from the susceptibility map that the high and very high susceptibility areas are in the central part of the study area.Furthermore, these regions are located in the higher part of the mountain, north of the predicted rupture zone.As a result, the slope above and north of the rupture plane is more susceptible to landslides than the slope south of the plane.

Model validation
The area under the receiver operating characteristic (ROC) curve, the Kappa coefficient, and overall accuracy are often used to compare and evaluate the results obtained from ML algorithms; other statistical approaches, such as (5) max t ij for j in j min t ij for j in j − (9) L − = min t ij for j in j + max t ij for j in j − (10) root mean square error (RMSE), are also popular statistical indices for the evaluation of model performance 16 .In this study, overall accuracy, ROC curves, the Kappa coefficient, and RMSE are used for the evaluation of model performance.The information on model evaluation is summarized in Table 2.
Table 2 shows that the overall accuracy of the ERT ML model considering the twelve causative factors, including the three seismic factors, is 87.2%, the kappa coefficient is 0.744, the area under the ROC curve is 0.941 (Fig. 12) and the RMSE value is 0.358.The area under the ROC curve of 0.941 represents excellent model performance.The 0.744 kappa coefficient shows strong model performance with the causative factors considered.

District wise susceptibility analysis
The study area is very large; therefore, the susceptibility details cannot be observed clearly within the whole susceptibility map; hence, further investigation is carried out district wise as well as for highly susceptible local units.From the district wise analysis, it is observed that the most susceptible district to earthquake-induced landslides is Sindhupalchok.One-fourth of the district falls into the very high or high susceptibility class.Based on the percentage area under the high or very high susceptibility class, Rasuwa, Dhading, Nuwakot, Gorkha and Dolakha followed the Sindhupalchok district.Detailed information is given in Fig. 13.The distribution and the landslide inventory show that the landform above or in the northern part of the surface of the rupture is more susceptible to the landslide.In Fig. 13, it is identified that the Sindhupalchok district is the most susceptible district.It requires very meticulous analysis and planning for carrying out developmental activities; therefore, further analysis and study are carried out at the local unit level.The area under different susceptibility classes in different local units is given in Fig. 14.
From Fig. 14, it is observed that in Sindhupalchok district, the most susceptible local unit is Barhabise municipality, with 19.98% and 20.34% of its area under the high and very high susceptibility classes, respectively, inferring that more than 40% of the Municipality falls under the very high or high susceptibility class.The susceptibility map of the Sindhupalchowk district and the Barhabise municipality is shown in Fig. 15.
Figure 15 clearly shows the areas susceptible to high and very high landslides in and around Barhabise municipality.It is recommended to avoid these areas susceptible to high and very high landslides for developmental activities such as rural road construction, water resource management, hydropower projects, etc., and for settlements.

Analysis of building exposure to co-seismic landslides
To analyze the buildings exposed to co-seismic landslide susceptibility, a building footprint map is overlaid on the co-seismic susceptibility map, as shown in Fig. 16.By using the zonal statistics method in QGIS, the buildings in each susceptibility class are counted, and most of the buildings lie in the very low and low susceptibility   In the Sindhupalchok district, 2483 buildings, or 3.06% of the total building count, fall into the "high" or "very high" classes.Similarly, 6.1% of the buildings in the Barhabise Municipality fall into these susceptibility classes (Fig. 18).The ratio is double that of the district, which is remarkably similar to the percentages in the region that fall into these susceptibility classes, namely, the district (24%) and the Barhabise (40%).According to this exposure distribution, as the area of high susceptibility increases, so does the building's exposure, posing significant risks to lives and property as well as long-term socioeconomic losses.Ward level vulnerability mapping is carried out in Barhabise municipality.The data was acquired from national population and housing census 2021.The analysis was carried out following the method mentioned in "Vulnerability assessment" section above and the result is shown in Fig. 19.
For Barhabise municipality, vulnerability score varies from 0.101 to 0.687.The risk map was classified in to five classes with equal interval.The risk map revealed that, the ward is located in high risk zone with 106 number of building exposed to high and very high-risk class.The detailed outcome of the vulnerability and risk assessment are shown in Figs.20 and 21.

Conclusion
Using Extremely Randomized Trees for analysis, the overall accuracy of the model is 87.2%, the area under the ROC curve is 0.94, the kappa coefficient is 0.744 and the RMSE value is 0.358, which indicates that the performance of the model is excellent with the considered causative factors.Therefore, district wise susceptibility shows that the Sindhupalchowk district is the most susceptible district, with 14.14% of the area under the high and 9.68% of the area under the very high susceptibility classes.Furthermore, the most susceptible local unit in Sindhupalchowk is Barhabise municipality, with 19.98% and 20.34% of its area under the high and very high susceptibility classes, respectively.To analyze the exposure of buildings to co-seismic landslide susceptibility, the building footprint map is overlayed on the landslide susceptibility map.From the analysis, the Sindhupalchowk district has the highest number, that is, 2483 buildings in high and very high susceptibility classes, and the  Dhading district has the lowest number of buildings, that is, 159 buildings in high and very high susceptibility classes.Similarly, the most susceptible local unit is found as Barhabise Municipality, where nearly 40% of the area falls under the high and very high susceptibility classes.The building count that falls under these categories is 586, which is also the highest in a single local unit.While, the risk analysis using susceptibility mapping, considering socio-economic factors and structural vulnerability in the municipality, indicated that only 106 households, constituting 1.1% of the total 9591, were identified as high and very high risk.
Furthermore, till date, many studies have been conducted in Nepal for landslide susceptibility mapping that are focused on a small area, but this is the first study to evaluate co-seismic landslide susceptibility and building exposure analysis considering a large area using a machine learning approach.Similarly, local level risk mapping and exposure analysis is carried out in this study which can provide a critical and handy information from the policy makers to the local people understand the risk and possible threat imposed by the co-seismic landslide.This study can be a reference for further research on co-seismic susceptibility mapping and postseismic landslide hazard mitigation considering different earthquake scenarios in Nepal as well as throughout the world with conditioning factors similar to those in this study.

Figure 3 .
Figure 3. Landslide inventory map with training and validation data sets.

Figure 6 .
Figure 6.Anthropogenic landslide causative factors (a) distance from the road, (b) land use and land cover.

Figure 7 .
Figure 7. Geological setting of the study area.

( 1 )
This model has better accuracy, as the results are determined by the maximum number of votes.(2) It is efficient in handling a large number of multidimensional input data sets.(3) It can also rank features during algorithm implementation by variable importance.

Figure 8 .
Figure 8. Building footprint map over the study area [source: open street map (OSM)].

Figure 9 .
Figure 9. Flow chart of extremely randomized tree model.

Figure 11 .
Figure 11.Percentage area and landslides in each susceptibility class in the study area.

Figure 12 .
Figure 12.Area under the ROC curve.

Figure 13 .
Figure 13.Area under different susceptible classes in different districts.

Figure 14 .
Figure 14.The area under different susceptibility classes in each local unit in Sindhupalchok district.

Figure 16 .
Figure 16.Building overlay on the co-seismic landslide susceptibility map.

Figure 17 .
Figure 17.Building counts over the study area for each district.

Figure 19 .
Figure 19.Vulnerability and risk map of Barhabise municipality with building footprint.

Figure 20 .
Figure 20.Building counts in different risk classes.

Figure 21 .
Figure 21.Vulnerability score with susceptibility class area in each ward of Barhabise municipality.

Table 1 .
Co-seismic landslide causative factors for this study.
Vol:.(1234567890) Scientific Reports | (2024) 14:5902 | https://doi.org/10.1038/s41598-024-54898-wwww.nature.com/scientificreports/Distance to faults Faults are discontinuities in the rock mass that are generated by tectonic or seismic activities.The areas around the fault are more prone to landslides as the rocks in the proximity are disturbed and are generally weak.Faults